AITopics | protein structure prediction

Collaborating Authors

protein structure prediction

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

materials

Neural Information Processing SystemsApr-24-2026, 20:49:33 GMT

A.1 Access instructions OpenProteinSet is hosted by the Registry of Open Data on AWS (RODA) and can be accessed at the following link: registry.opendata.aws/openfold/. A.2 Documentation and intended uses We include a datasheet [1] in Section B. Detailed documentation on the precise structure and content of the dataset is provided on the dataset's landing page. A.3 Data format All OpenProteinSet files are in standard plaintext formats (A3M for MSAs, HHSearch format for template hits, and PDB for structure files) that can be read by a wide variety of bioinformatics software. A.5 License OpenProteinSet is made available under the CCBY 4.0 license. A copy of the license is provided with the dataset.

artificial intelligence, bioinformatics, dataset, (15 more...)

Neural Information Processing Systems

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.72)
Law (0.47)

Technology:

Information Technology > Artificial Intelligence (0.70)
Information Technology > Biomedical Informatics (0.50)

Add feedback

OpenProteinSet: Training data for structural biology at scale

Neural Information Processing SystemsApr-24-2026, 20:49:30 GMT

Multiple sequence alignments (MSAs) of proteins encode rich biological information and have been workhorses in bioinformatic methods for tasks like protein design and protein structure prediction for decades. Recent breakthroughs like AlphaFold2 that use transformers to attend directly over large quantities of raw MSAs have reaffirmed their importance. Generation of MSAs is highly computationally intensive, however, and no datasets comparable to those used to train AlphaFold2 have been made available to the research community, hindering progress in machine learning for proteins. To remedy this problem, we introduce OpenProteinSet, an open-source corpus of more than 16 million MSAs, associated structural homologs from the Protein Data Bank, and AlphaFold2 protein structure predictions. We have previously demonstrated the utility of OpenProteinSet by successfully retraining AlphaFold2 on it. We expect OpenProteinSet to be broadly useful as training and validation data for 1) diverse tasks focused on protein structure, function, and design and 2) large-scale multimodal machine learning research.

artificial intelligence, bioinformatics, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Biomedical Informatics > Translational Bioinformatics (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

MSAGPT: Neural Prompting Protein Structure Prediction via MSA Generative Pre-Training

Neural Information Processing SystemsMar-19-2026, 22:43:40 GMT

Multiple Sequence Alignment (MSA) plays a pivotal role in unveiling the evolutionary trajectories of protein families. The accuracy of protein structure predictions is often compromised for protein sequences that lack sufficient homologous information to construct high-quality MSA. Although various methods have been proposed to generate high-quality MSA under these conditions, they fall short in comprehensively capturing the intricate co-evolutionary patterns within MSA or require guidance from external oracle models. Here we introduce MSAGPT, a novel approach to prompt protein structure predictions via MSA generative pre-training in a low-MSA regime. MSAGPT employs a simple yet effective 2D evolutionary positional encoding scheme to model the complex evolutionary patterns.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

694be3548697e9cc8999d45e8d16fe1e-Paper-Conference.pdf

Neural Information Processing SystemsFeb-15-2026, 13:38:13 GMT

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > Canada > Quebec (0.04)
Asia > China > Hong Kong (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

A Supplementary materials

Neural Information Processing SystemsFeb-7-2026, 21:03:58 GMT

A.2 Documentation and intended uses We include a datasheet [1] in Section B. Detailed documentation on the precise structure and content OpenProteinSet is made available under the CC BY 4.0 license. The authors bear all responsibility in case of violation of rights. OpenProteinSet will continue to be hosted on RODA for the foreseeable future. A.7 Alignment tool settings For JackHMMer, we used -N 1 -E 0.0001 -incE 0.0001 -F1 0.0005 -F2 0.00005 -F3 0.0000005 and then capped outputs at depth 5000. B.1 Motivation For what purpose was the dataset created?

artificial intelligence, dataset, openproteinset, (14 more...)

Neural Information Processing Systems

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.72)
Law (0.47)

Technology: Information Technology > Artificial Intelligence (0.70)

Add feedback

OpenProteinSet: Training data for structural biology at scale

Neural Information Processing SystemsFeb-7-2026, 21:03:55 GMT

Each row of an MSA is a protein sequence.

bioinformatics, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Biomedical Informatics > Translational Bioinformatics (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

OpenProteinSet: Training data for structural biology at scale

Neural Information Processing SystemsDec-23-2025, 21:57:39 GMT

openproteinset, structural biology, training data, (6 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.81)

Add feedback

From Static Structures to Ensembles: Studying and Harnessing Protein Structure Tokenization

Liu, Zijing, Feng, Bin, Cao, He, Li, Yu

arXiv.org Artificial IntelligenceNov-14-2025

Protein structure tokenization converts 3D structures into discrete or vectorized representations, enabling the integration of structural and sequence data. Despite many recent works on structure tokenization, the properties of the underlying discrete representations are not well understood. In this work, we first demonstrate that the successful utilization of structural tokens in a language model for structure prediction depends on using rich, pre-trained sequence embeddings to bridge the semantic gap between the sequence and structural "language". The analysis of the structural vocabulary itself then reveals significant semantic redundancy, where multiple distinct tokens correspond to nearly identical local geometries, acting as "structural synonyms". This redundancy, rather than being a flaw, can be exploited with a simple "synonym swap" strategy to generate diverse conformational ensembles by perturbing a predicted structure with its structural synonyms. This computationally lightweight method accurately recapitulates protein flexibility, performing competitively with state-of-the-art models. Our study provides fundamental insights into the nature of discrete protein structure representations and introduces a powerful, near-instantaneous method for modeling protein dynamics. Source code is available in https://github.com/IDEA-XL/TokenMD.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2511.10056

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

The Role of AI in Facilitating Interdisciplinary Collaboration: Evidence from AlphaFold

Zhao, Naixuan, Wei, Chunli, Zhang, Xinyan, Li, Jiang

arXiv.org Artificial IntelligenceOct-28-2025

The acceleration of artificial intelligence (AI) in science is recognized and many scholars have begun to explore its role in interdisciplinary collaboration. However, the mechanisms and extent of this impact are still unclear. This study, using AlphaFold's impact on structural biologists, examines how AI technologies influence interdisciplinary collaborative patterns. By analyzing 1,247 AlphaFold-related papers and 7,700 authors from Scopus, we employ bibliometric analysis and causal inference to compare interdisciplinary collaboration between AlphaFold adopters and non-adopters. Contrary to the widespread belief that AI facilitates interdisciplinary collaboration, our findings show that AlphaFold increased structural biology-computer science collaborations by just 0.48%, with no measurable effect on other disciplines. Specifically, AI creates interdisciplinary collaboration demands with specific disciplines due to its technical characteristics, but this demand is weakened by technological democratization and other factors. These findings demonstrate that artificial intelligence (AI) alone has limited efficacy in bridging disciplinary divides or fostering meaningful interdisciplinary collaboration.

data mining, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2508.13234

Country: Asia > China (0.28)

Genre: